[1] ".GlobalEnv"           "package:readxl"       "package:gt"           "package:plotly"      
 [5] "package:pls"          "package:glmnet"       "package:Matrix"       "package:class"       
 [9] "package:nnet"         "package:pROC"         "package:boot"         "package:leaps"       
[13] "package:car"          "package:carData"      "package:faraway"      "package:ISLR"        
[17] "package:FNN"          "package:vcd"          "package:grid"         "package:fitdistrplus"
[21] "package:survival"     "package:MASS"         "package:stats4"       "package:haven"       
[25] "package:pracma"       "package:matlib"       "package:forcats"      "package:stringr"     
[29] "package:dplyr"        "package:purrr"        "package:readr"        "package:tidyr"       
[33] "package:tibble"       "package:ggplot2"      "package:tidyverse"    "package:knitr"       
[37] "package:stats"        "package:graphics"     "package:grDevices"    "package:utils"       
[41] "package:datasets"     "package:methods"      "Autoloads"            "package:base"        

1 PREDICITON OF ALL NBA TEAM

1.1 1. SUMMARY: WHAT DO WE WANT?

ALL NBA TEAM of the Year is an annual NBA award given to the best players of the season. Voting is by a group of sports journalists and announcers from the United States and Canada. The team has been chosen in each NBA season, since its inauguration in 1946. The award consists of three quintets consisting of a total of 15 players, five on each team. It originally consisted of two teams, but in 1988 it was increased to three.

Players receive five points for each vote on the first team, three points for each vote on the second team, and one point for each vote on the third team. The five players with the highest total number of points enter the first team, with the next five players integrating the second team and the same with the third. There is a position restriction. In each voting of 5 players (of each quintet), 2 players are voted that are guard, in our data.frame “PG” and “SG”; the other 2 being forward, “SF” and “PF” and the last player being center “C”. They are basically the top 15 players of the season. We will look at the statistics made by these 15 players (in all seasons), and with this information, having the statistics of all the players of the last 35 seasons, we will try to know which will be in the quintet and which not in recent years.

1.1.1 OBJECTIVE:

Our goal is to make a model that is able to predict these 15 players. To do this we need to make a multiple logistic regression model where, according to each player’s statistics, the model gives you a chance to belong to the best quintets. We need to keep in mind the restriction that only certain players in each position can have. We have a database, explained more specifically in point 2. INTRODUCTION TO THE DATABASE where we will have all the information to create this model. We will check the results obtained by the model in some of the last seasons where we already know the results, then we will test it with the statistics we have so far to predict this year’s quintets.

1.2 2. DATA:

Our information for making this model is divided into two databases: - A database where we have all the information from 1980 to 2017 with all the player statistics for each season. We also introduce the quintetO variable that will provide us with information on whether or not the player is in the quintet of the season. - And another where we have the information of the data of the current season. In the first database we have the statistics of each player in a given season. The data have the following variables:

|Variables explained in the document|

1.3 3. PROCESS

We will divide the process into two parts:

  • On the one hand we have the creation of the model with a database that contains all the statistics from 1980 to 2017. In this, we will predict the quintets of the years 2015, 2016 and 2017. In the study we will be able to check which players have predicted correctly, and which ones don’t.

  • In the second part we will take the data available this year from the competition to the stop due to the COVID-19. In this one we will not be able to check if the model hits the players or not as this prize has not been awarded yet. We will also work on a data.frame that does not contain the same variables (although it is similar) so we will have to create another model.

1.3.1 3.1. MODEL CREATION PROCESS WITH KNOWN DATA (1980 - 2017)

The steps we have followed are as follows:

  • Reading and cleaning the dataframe (reading the csv, setting the variable type, changing null values to 0, deleting rows with duplicate players…)
#Read csv file
library(readr)
Seasons_Stats <- read_csv("Seasons_Stats.csv",
     col_types = cols(`2P%` = col_number(), 
         `3P` = col_double(), `3PAr` = col_number(), 
         `AST%` = col_number(), BLK = col_double(), 
         `BLK%` = col_number(), BPM = col_double(), 
         DBPM = col_double(), DRB = col_double(), 
         `DRB%` = col_number(), `FT%` = col_number(), 
         FTr = col_number(), GS = col_double(), 
         MP = col_double(), OBPM = col_double(), 
         ORB = col_double(), `ORB%` = col_number(), 
         PER = col_double(), STL = col_double(), 
         `STL%` = col_number(), `TOV%` = col_number(),`3P%` = col_double() ,
         `3PA` = col_double(), `TOV` = col_double(),
         TRB = col_double(), `TRB%` = col_number(), 
         `TS%` = col_number(), `USG%` = col_number(), 
         VORP = col_double(), `WS/48` = col_number(), 
         X1 = col_skip(), Year = col_integer(), 
         blank2 = col_skip(), blanl = col_skip(), 
         `eFG%` = col_number()))
#We select the data from 80s, when three-point shot is introduced in the NBA.
library(tidyverse)
Seasons_Stats2 <- Seasons_Stats %>% subset(Year > 1979)

1.3.2 Players in the all nba team

#What we want to do is find which players are most likely to be in the best quintet. We will do this based on a training set and a test set.To do this, we will then nail the all_nba team (quinteto) variable, followd by the year (quinteto_year), which will have 1 (if the player is in a better all nba team) and 0 if it is not. In this way, we enter the data, and then we enter the column.

#Those players who have a * in the database are those who have been elected to the Basketball Hall of Fame. We also have it in the database.

quinteto80 <- c("Julius Erving*","Dan Roundfield","Larry Bird*","Marques Johnson","Kareem Abdul-Jabbar*","Moses Malone*","George Gervin*","Dennis Johnson*","Paul Westphal","Gus Williams")

quinteto81 <- c("Julius Erving*","Marques Johnson", "Larry Bird*","Adrian Dantley*","Kareem Abdul-Jabbar*","Moses Malone*","George Gervin*","Otis Birdsong","Dennis Johnson*","Tiny Archibald*")

quinteto82 <- c("Larry Bird*","Alex English*","Julius Erving*","Bernard King*","Moses Malone*","Robert Parish*", "George Gervin*","Magic Johnson*", "Gus Williams","Sidney Moncrief")

quinteto83 <- c("Larry Bird*","Alex English*" ,"Julius Erving*","Buck Williams","Moses Malone*","Kareem Abdul-Jabbar*", "Magic Johnson*","George Gervin*","Sidney Moncrief","Isiah Thomas*")

quinteto84 <- c("Larry Bird*","Adrian Dantley*","Bernard King*" ,"Julius Erving*","Kareem Abdul-Jabbar*","Moses Malone*" ,"Magic Johnson*","Sidney Moncrief" ,"Isiah Thomas*","Jim Paxson")

quinteto85 <- c("Larry Bird*","Terry Cummings","Bernard King*","Ralph Sampson*","Moses Malone*", "Kareem Abdul-Jabbar*","Magic Johnson*","Michael Jordan*","Isiah Thomas*","Sidney Moncrief")

quinteto86 <- c("Larry Bird*","Charles Barkley*","Dominique Wilkins*","Alex English*","Kareem Abdul-Jabbar*","Hakeem Olajuwon*","Magic Johnson*","Sidney Moncrief","Isiah Thomas*","Alvin Robertson")

quinteto87 <- c("Larry Bird*" ,"Dominique Wilkins*","Kevin McHale*","Charles Barkley*","Hakeem Olajuwon*", "Moses Malone*","Magic Johnson*","Isiah Thomas*","Michael Jordan*","Fat Lever")

quinteto88 <- c("Larry Bird*","Karl Malone*","Charles Barkley*","Dominique Wilkins*","Hakeem Olajuwon*", "Patrick Ewing*","Michael Jordan*","Clyde Drexler*","Magic Johnson*","John Stockton*")

quinteto89 <- c("Karl Malone*","Tom Chambers","Dominique Wilkins*", "Charles Barkley*","Chris Mullin*", "Terry Cummings","Hakeem Olajuwon*","Patrick Ewing*","Robert Parish*","Michael Jordan*" ,"John Stockton*","Dale Ellis","Magic Johnson*","Kevin Johnson","Mark Price")

quinteto90 <- c("Karl Malone*","Larry Bird*","James Worthy*","Charles Barkley*","Tom Chambers","Chris Mullin*","Patrick Ewing*","Hakeem Olajuwon*","David Robinson*","Magic Johnson*","John Stockton*","Clyde Drexler*","Michael Jordan*","Kevin Johnson","Joe Dumars*")

quinteto91 <- c("Karl Malone*","Dominique Wilkins*","James Worthy*","Charles Barkley*","Chris Mullin*","Bernard King*","David Robinson*","Patrick Ewing*","Hakeem Olajuwon*","Michael Jordan*" ,"Kevin Johnson","John Stockton*","Magic Johnson*","Clyde Drexler*","Joe Dumars*")

quinteto92 <- c("Karl Malone*","Scottie Pippen*","Dennis Rodman*","Chris Mullin*","Charles Barkley*","Kevin Willis","David Robinson*","Patrick Ewing*","Brad Daugherty","Michael Jordan*","Tim Hardaway","Mark Price", "Clyde Drexler*","John Stockton*","Kevin Johnson")

quinteto93 <- c("Charles Barkley*","Dominique Wilkins*","Scottie Pippen*","Karl Malone*","Larry Johnson", "Derrick Coleman","Hakeem Olajuwon*","Patrick Ewing*","David Robinson*","Michael Jordan*","John Stockton*","Tim Hardaway","Mark Price","Joe Dumars*","Drazen Petrovic*")

quinteto94 <- c("Scottie Pippen*","Shawn Kemp","Derrick Coleman","Karl Malone*","Charles Barkley*","Dominique Wilkins*","Hakeem Olajuwon*","David Robinson*","Shaquille O'Neal*","John Stockton*","Mitch Richmond*","Mark Price","Latrell Sprewell","Kevin Johnson","Gary Payton*")

quinteto95 <- c("Karl Malone*","Charles Barkley*","Detlef Schrempf","Scottie Pippen*","Shawn Kemp","Dennis Rodman*","David Robinson*","Shaquille O'Neal*","Hakeem Olajuwon*","John Stockton*","Gary Payton*","Reggie Miller*","Anfernee Hardaway","Mitch Richmond*","Clyde Drexler*")

quinteto96 <- c("Scottie Pippen*","Shawn Kemp","Charles Barkley*","Karl Malone*","Grant Hill","Juwan Howard","David Robinson*","Hakeem Olajuwon*","Shaquille O'Neal*","Michael Jordan*","Gary Payton*","Mitch Richmond*","Anfernee Hardaway","John Stockton*" ,"Reggie Miller*")

quinteto97 <- c("Karl Malone*" ,"Scottie Pippen*" ,"Anthony Mason","Grant Hill","Glen Rice","Vin Baker", "Hakeem Olajuwon*","Patrick Ewing*","Shaquille O'Neal*" ,"Michael Jordan*" ,"Gary Payton*","John Stockton*","Tim Hardaway","Mitch Richmond*" ,"Anfernee Hardaway")

quinteto98 <- c("Karl Malone*","Grant Hill","Scottie Pippen*","Tim Duncan","Vin Baker","Glen Rice","Shaquille O'Neal*","David Robinson*","Dikembe Mutombo*","Michael Jordan*","Tim Hardaway","Mitch Richmond*" ,"Gary Payton*","Rod Strickland","Reggie Miller*")

quinteto99 <- c("Karl Malone*","Chris Webber" ,"Kevin Garnett","Tim Duncan", "Grant Hill" ,"Antonio McDyess" ,"Alonzo Mourning*","Shaquille O'Neal*","Hakeem Olajuwon*","Allen Iverson*","Gary Payton*","Kobe Bryant","Jason Kidd","Tim Hardaway","John Stockton*")

quinteto00 <- c("Tim Duncan","Karl Malone*","Chris Webber","Kevin Garnett","Grant Hill","Vince Carter","Shaquille O'Neal*","Alonzo Mourning*" ,"David Robinson*","Jason Kidd","Allen Iverson*","Eddie Jones","Gary Payton*","Kobe Bryant","Stephon Marbury")

quinteto01 <- c("Tim Duncan","Kevin Garnett","Karl Malone*","Chris Webber","Vince Carter","Dirk Nowitzki","Shaquille O'Neal*","Dikembe Mutombo*","David Robinson*","Allen Iverson*","Kobe Bryant","Gary Payton*" ,"Jason Kidd","Tracy McGrady","Ray Allen")

quinteto02 <- c("Tim Duncan","Kevin Garnett","Ben Wallace","Tracy McGrady","Chris Webber","Jermaine O'Neal","Shaquille O'Neal*","Dirk Nowitzki","Dikembe Mutombo*","Jason Kidd","Gary Payton*","Paul Pierce", "Kobe Bryant","Allen Iverson*","Steve Nash")

quinteto03 <- c("Tim Duncan","Dirk Nowitzki","Paul Pierce","Kevin Garnett","Chris Webber" ,"Jamal Mashburn","Shaquille O'Neal*","Ben Wallace","Jermaine O'Neal","Kobe Bryant","Jason Kidd","Stephon Marbury","Tracy McGrady","Allen Iverson*","Steve Nash")

quinteto04 <- c("Kevin Garnett","Jermaine O'Neal","Dirk Nowitzki","Tim Duncan","Peja Stojakovicn","Ron Artest","Shaquille O'Neal*","Ben Wallace","Yao Ming*","Kobe Bryant","Sam Cassell","Michael Redd","Jason Kidd","Tracy McGrady","Baron Davis")

quinteto05 <- c("Tim Duncan","LeBron James","Tracy McGrady","Dirk Nowitzki","Kevin Garnett","Shawn Marion", "Shaquille O'Neal*","Amar'e Stoudemire","Ben Wallace","Allen Iverson*","Dwyane Wade","Kobe Bryant","Steve Nash","Ray Allen","Gilbert Arenas")

quinteto06 <- c("LeBron James","Elton Brand","Shawn Marion","Dirk Nowitzki","Tim Duncan","Carmelo Anthony", "Shaquille O'Neal*","Ben Wallace","Yao Ming*","Kobe Bryant","Chauncey Billups","Allen Iverson*","Steve Nash","Dwyane Wade","Gilbert Arenas")

quinteto07 <- c("Dirk Nowitzki","LeBron James","Kevin Garnett","Tim Duncan","Chris Bosh","Carmelo Anthony","Amar'e Stoudemire","Yao Ming*","Dwight Howard","Steve Nash","Gilbert Arenas","Dwyane Wade","Kobe Bryant","Tracy McGrady","Chauncey Billups")

quinteto08 <- c("Kevin Garnett","Dirk Nowitzki","Carlos Boozer","LeBron James","Tim Duncan" ,"Paul Pierce", "Dwight Howard","Amar'e Stoudemire","Yao Ming*","Kobe Bryant","Steve Nash","Tracy McGrady","Chris Paul","Deron Williams","Manu Ginobili")

quinteto09 <- c("LeBron James","Tim Duncan","Carmelo Anthony","Dirk Nowitzki","Paul Pierce","Pau Gasol","Dwight Howard","Yao Ming*","Shaquille O'Neal*","Kobe Bryant","Brandon Roy","Chauncey Billups", "Dwyane Wade","Chris Paul","Tony Parker")

quinteto10 <- c("LeBron James","Dirk Nowitzki","Brandon Roy","Kevin Durant","Steve Nash","Pau Gasol","Dwight Howard","Amar'e Stoudemire","Andrew Bogut","Kobe Bryant","Carmelo Anthony","Tim Duncan", "Dwyane Wade","Deron Williams","Joe Johnson")

quinteto11 <- c("LeBron James","Dirk Nowitzki","LaMarcus Aldridge","Kevin Durant","Amar'e Stoudemire","Zach Randolph","Dwight Howard","Pau Gasol","Al Horford", "Kobe Bryant","Dwyane Wade","Manu Ginobili","Derrick Rose","Russell Westbrook","Chris Paul")

quinteto12 <- c("LeBron James","Kevin Love","Carmelo Anthony","Kevin Durant","Blake Griffin","Dirk Nowitzki","Dwight Howard","Andrew Bynum","Tyson Chandler","Kobe Bryant","Tony Parker","Dwyane Wade","Chris Paul","Russell Westbrook","Rajon Rondo")

quinteto13 <- c("LeBron James","Carmelo Anthony","Paul George","Kevin Durant","Blake Griffin","David Lee", "Tim Duncan","Marc Gasol","Dwight Howard","Kobe Bryant","Tony Parker","Dwyane Wade","Chris Paul","Russell Westbrook","James Harden")

quinteto14 <- c("Kevin Durant","Blake Griffin","Paul George","LeBron James","Kevin Love","LaMarcus Aldridge","Joakim Noah","Dwight Howard","Al Jefferson","James Harden","Stephen Curry","Goran Dragic","Chris Paul","Tony Parker","Damian Lillard")

quinteto15 <- c("LeBron James","LaMarcus Aldridge","Blake Griffin","Anthony Davis","DeMarcus Cousins","Tim Duncan","Marc Gasol","Pau Gasol","DeAndre Jordan","James Harden","Russell Westbrook","Klay Thompson","Stephen Curry","Chris Paul","Kyrie Irving")

quinteto16 <- c("Kawhi Leonard","Kevin Durant","Paul George","LeBron James","Draymond Green","LaMarcus Aldridge","DeAndre Jordan","DeMarcus Cousins","Andre Drummond","Stephen Curry","Damian Lillard","Klay Thompson","Russell Westbrook","Chris Paul","Kyle Lowry")

quinteto17 <- c("Kawhi Leonard","Kevin Durant","Jimmy Butler","LeBron James","Giannis Antetokounmpo","Draymond Green","Anthony Davis","Rudy Gobert","DeAndre Jordan","James Harden","Stephen Curry","John Wall","Russell Westbrook","Isaiah Thomas","DeMar DeRozan")
quinteto20 <-c('Guillem Miralles','Miguel Payá')
  • Introduce a variable in the comic called a quintet, which tells us whether or not the player has been in the ALL NBA TEAM of that season.
#With this function what we do is introduce the all nba team of each year and the year to which it corresponds. Our function what it does is return a data.frame with the new column (quinteto) indicating 1 if the player is in the quintet, and 0 if it is not. Then we will join all the dat.frame in order to get the data for all the seasons.

funcion_quinteto <- function(quinteto_df,any){
df <- Seasons_Stats2 %>% subset(Year == any)
quinteto <- 0
for (i in (1:length(df$Player))) {
if (df$Player[i] %in% quinteto_df){
  quinteto[i] = 1
}else{
  quinteto[i] = 0
}
}
df_any<- data.frame(quinteto,df)
return(df_any)
}
#We apply the function for each year:
df_any80 <- funcion_quinteto(quinteto80,1980)
df_any81 <- funcion_quinteto(quinteto81,1981)
df_any82 <- funcion_quinteto(quinteto82,1982)
df_any83 <- funcion_quinteto(quinteto83,1983)
df_any84 <- funcion_quinteto(quinteto84,1984)
df_any85 <- funcion_quinteto(quinteto85,1985)
df_any86 <- funcion_quinteto(quinteto86,1986)
df_any87 <- funcion_quinteto(quinteto87,1987)
df_any88 <- funcion_quinteto(quinteto88,1988)
df_any89 <- funcion_quinteto(quinteto89,1989)
df_any90 <- funcion_quinteto(quinteto90,1990)
df_any91 <- funcion_quinteto(quinteto91,1991)
df_any92 <- funcion_quinteto(quinteto92,1992)
df_any93 <- funcion_quinteto(quinteto93,1993)
df_any94 <- funcion_quinteto(quinteto94,1994)
df_any95 <- funcion_quinteto(quinteto95,1995)
df_any96 <- funcion_quinteto(quinteto96,1996)
df_any97 <- funcion_quinteto(quinteto97,1997)
df_any98 <- funcion_quinteto(quinteto98,1998)
df_any99 <- funcion_quinteto(quinteto99,1999)
df_any00 <- funcion_quinteto(quinteto00,2000)
df_any01 <- funcion_quinteto(quinteto01,2001)
df_any02 <- funcion_quinteto(quinteto02,2002)
df_any03 <- funcion_quinteto(quinteto03,2003)
df_any04 <- funcion_quinteto(quinteto04,2004)
df_any05 <- funcion_quinteto(quinteto05,2005)
df_any06 <- funcion_quinteto(quinteto06,2006)
df_any07 <- funcion_quinteto(quinteto07,2007)
df_any08 <- funcion_quinteto(quinteto08,2008)
df_any09 <- funcion_quinteto(quinteto09,2009)
df_any10 <- funcion_quinteto(quinteto10,2010)
df_any11 <- funcion_quinteto(quinteto11,2011)
df_any12 <- funcion_quinteto(quinteto12,2012)
df_any13 <- funcion_quinteto(quinteto13,2013)
df_any14 <- funcion_quinteto(quinteto14,2014)
df_any15 <- funcion_quinteto(quinteto15,2015)
df_any16 <- funcion_quinteto(quinteto16,2016)
df_any17 <- funcion_quinteto(quinteto17,2017)
#We unite in order to have all the seasons:

#We set up a training set and a test set. The training set will be stored until 2011, while the test set will be used from 2012 to 2017.
bd <- rbind(df_any80,df_any81,df_any82,df_any83,df_any84,df_any85,df_any86,df_any87,df_any88,df_any89,
            df_any90,df_any91,df_any92,df_any93,df_any94,df_any95,df_any96,df_any97,df_any98,df_any99,
            df_any00,df_any01,df_any02,df_any03,df_any04,df_any05,df_any06,df_any07,df_any08,df_any09,
            df_any10,df_any11)
bdpredict<-rbind(df_any12,df_any13,df_any14,df_any15,df_any16,df_any17)

#We remove the NULL values and na's
bd[is.na(bd)] = 0
bdpredict[is.na(bdpredict)]= 0
#In the data we have a problem. There are players who are in the all nba team, who in the middle of the season have been transferred to another team. So they have stats from 2 different teams. The database already incorporates the sum of these two statistics, so we have 3 rows with the player's statistics. We are only interested in the total data for the season. The players in this situation are: Dominique Wiklins (1994), Clyde Draxler (1995), Dikembe Mutombo (2001) and Chauncey Billups (2009). We eliminate the two rows that do not interest us in each player (those of the teams).

bd <- bd[-c(6069,6070,6203,6204,9568,9569,13639,13640),]
df_any17 <-df_any17[-c(112,113),] 
  • We visualize the data and observe that many of the variables correlate with each other or do not provide us with relevant information. Therefore, having so many predictive variables, want to regularize them.
#We begin the visualization of the variables. Already at first glance if we look at the variables, there are many that can be correlated as they explain the same thing. To see it more clearly, let's look at some of them:

attach(bd)
pairs( X3P ~ X3PA + X3P.)

pairs(X2P ~ X2PA + X2P.)

pairs(WS ~ OWS + DWS + WS.48)

pairs( PTS ~ FG + FT)

pairs(BPM ~ DBPM + OBPM)

pairs(FGA ~ X3PA + X2PA)

#We can observe that many of the variables present us with information that is not entirely relevant. To know which ones we are going to use, we are going to use reduction techniques that help us find the best variables for our model.
  • We have observed that there are many correlated variables. Therefore we are going to realize the regularization of variables, with the purpose of to reduce the variance of the same. We use the Lasso method and check if the results obtained are adjusted.

1.3.2.1 Dimensionality Reduction with LASSO

library(glmnet)
x <- model.matrix(quinteto~ Pos+Age+G+MP+PER+TS.+X3PAr+FTr+ORB.+TRB.+AST.+STL.+BLK.+TOV.+USG.+OWS+DWS+WS+WS.48+OBPM+DBPM+BPM+VORP+FG+FGA+FG.+X3P+X3PA+X3P.+X2P+X2PA+X2P.+eFG.+FT+FTA+FT.+PF+DRB.+ORB+TRB+DRB+AST+STL+BLK+TOV+PTS,bd)[,-1] 
y <- bd$quinteto


lambdas <- 10^seq(5,-5,length=100)
set.seed(12345)
cv.lasso.NBA <- cv.glmnet(x,y,alpha=1,lambda=lambdas)
plot(cv.lasso.NBA)

cv.lasso.NBA$lambda.1se
[1] 0.001047616
lasso.final <- glmnet(x,y,alpha=1,lambda =round(cv.lasso.NBA$lambda.1se,3))
coef(lasso.final)[coef(lasso.final)[,1] !=0,]
  (Intercept)         PosPF         PosSF           Age             G            MP           TS. 
 2.222875e-02 -1.436547e-03 -3.706560e-03  3.249107e-04  7.727267e-04 -1.716349e-04 -4.249438e-02 
          FTr          STL.          TOV.          USG.           DWS            WS           BPM 
-1.607393e-02 -7.022702e-04  2.898649e-04 -2.537874e-04  6.191524e-03  1.821565e-02 -4.986440e-04 
         VORP            FG           FGA           X2P           FTA            PF           ORB 
 3.161288e-02  2.369019e-05  1.295914e-04  6.016663e-05  3.683313e-04 -3.212804e-04 -1.507431e-04 
          DRB           AST           STL           BLK 
 1.636141e-04  1.907179e-04 -9.944877e-05  3.400609e-04 
#   We observe that these variables are the ones that the Lasso method indicates to us that they are more explanatory, since they have different coefficients from 0.
  • We see how the variables we are interested in are greatly reduced. As we are performing a logistic regression, in the variables we obtained from the previous point, we perform three models using three different methods which are the ones we will compare. These three methods are: Multiple Logistic Regression (GLM), Quadratic Discriminant Analysis (QDA), and Linear Discriminant Analysis (LDA). We do not take the KNN method because we already know that neighboring values are not interesting for predicting the next value.

  • We make comparisons between the models and look at the following results to choose the one that interests us most.

glmnba <- glm(quinteto ~ Pos+ Age + G + MP + TS. + FTr + STL. + TOV. + USG. +DWS + WS + BPM + VORP + FG + FGA + X2P + FTA + PF + ORB + DRB + AST + STL + BLK, data = bd, family = 'binomial')
summary(glmnba)

Call:
glm(formula = quinteto ~ Pos + Age + G + MP + TS. + FTr + STL. + 
    TOV. + USG. + DWS + WS + BPM + VORP + FG + FGA + X2P + FTA + 
    PF + ORB + DRB + AST + STL + BLK, family = "binomial", data = bd)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.5398  -0.0255  -0.0097  -0.0032   3.1858  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -9.895e+00  1.979e+00  -4.999 5.76e-07 ***
PosC-PF     -8.805e+00  1.146e+03  -0.008 0.993867    
PosC-SF     -5.389e+00  6.523e+03  -0.001 0.999341    
PosPF       -2.028e-01  3.279e-01  -0.618 0.536389    
PosPF-C     -1.042e+01  1.187e+03  -0.009 0.992995    
PosPF-SF    -8.776e+00  1.345e+03  -0.007 0.994794    
PosPG        4.628e-01  6.661e-01   0.695 0.487173    
PosPG-SF    -1.081e+01  6.523e+03  -0.002 0.998678    
PosPG-SG    -1.095e+01  1.138e+03  -0.010 0.992323    
PosSF       -6.682e-02  4.532e-01  -0.147 0.882787    
PosSF-PF    -1.016e+01  1.376e+03  -0.007 0.994111    
PosSF-SG    -1.421e+01  9.751e+02  -0.015 0.988372    
PosSG        8.679e-01  5.495e-01   1.579 0.114265    
PosSG-PF    -1.334e+01  3.341e+03  -0.004 0.996815    
PosSG-PG    -1.414e+01  1.172e+03  -0.012 0.990373    
PosSG-SF    -9.913e+00  1.070e+03  -0.009 0.992606    
Age          5.493e-02  2.668e-02   2.059 0.039498 *  
G           -1.427e-01  2.310e-02  -6.176 6.59e-10 ***
MP          -4.875e-05  6.237e-04  -0.078 0.937701    
TS.         -3.015e+00  2.827e+00  -1.066 0.286216    
FTr          9.998e-01  6.818e-01   1.466 0.142559    
STL.        -2.864e-01  2.604e-01  -1.100 0.271476    
TOV.         6.865e-02  2.689e-02   2.553 0.010692 *  
USG.         1.094e-01  2.931e-02   3.732 0.000190 ***
DWS          4.977e-01  1.207e-01   4.123 3.75e-05 ***
WS           8.476e-01  9.517e-02   8.907  < 2e-16 ***
BPM          1.696e-01  4.532e-02   3.742 0.000183 ***
VORP        -6.077e-01  1.610e-01  -3.776 0.000160 ***
FG           1.116e-02  6.547e-03   1.704 0.088382 .  
FGA          1.356e-03  2.131e-03   0.636 0.524552    
X2P         -4.430e-03  3.174e-03  -1.396 0.162822    
FTA          3.216e-04  1.220e-03   0.264 0.792056    
PF          -6.412e-03  2.603e-03  -2.463 0.013772 *  
ORB          5.040e-03  2.085e-03   2.417 0.015657 *  
DRB          1.462e-03  1.309e-03   1.117 0.263990    
AST          5.366e-03  1.125e-03   4.768 1.86e-06 ***
STL          2.717e-03  5.768e-03   0.471 0.637620    
BLK          8.365e-03  2.610e-03   3.204 0.001354 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 3944.24  on 15359  degrees of freedom
Residual deviance:  894.74  on 15322  degrees of freedom
AIC: 970.74

Number of Fisher Scoring iterations: 17
step(glmnba)
Start:  AIC=970.74
quinteto ~ Pos + Age + G + MP + TS. + FTr + STL. + TOV. + USG. + 
    DWS + WS + BPM + VORP + FG + FGA + X2P + FTA + PF + ORB + 
    DRB + AST + STL + BLK

       Df Deviance     AIC
- Pos  15   906.56  952.56
- MP    1   894.75  968.75
- FTA   1   894.81  968.81
- STL   1   894.98  968.98
- FGA   1   895.14  969.14
- FTr   1   895.86  969.86
- TS.   1   895.88  969.88
- DRB   1   895.99  969.99
- X2P   1   896.70  970.70
<none>      894.74  970.74
- STL.  1   896.93  970.93
- FG    1   897.70  971.70
- Age   1   898.98  972.98
- TOV.  1   899.58  973.58
- ORB   1   900.60  974.60
- PF    1   900.85  974.85
- USG.  1   903.61  977.61
- BLK   1   905.05  979.05
- BPM   1   907.97  981.97
- VORP  1   909.07  983.07
- DWS   1   912.32  986.32
- AST   1   919.51  993.51
- G     1   937.49 1011.49
- WS    1   981.81 1055.81

Step:  AIC=952.56
quinteto ~ Age + G + MP + TS. + FTr + STL. + TOV. + USG. + DWS + 
    WS + BPM + VORP + FG + FGA + X2P + FTA + PF + ORB + DRB + 
    AST + STL + BLK

       Df Deviance     AIC
- MP    1   906.58  950.58
- DRB   1   906.67  950.67
- FTA   1   906.85  950.85
- STL   1   906.92  950.92
- FGA   1   907.23  951.23
- TS.   1   907.50  951.50
- FTr   1   907.54  951.54
<none>      906.56  952.56
- STL.  1   908.71  952.71
- X2P   1   909.40  953.40
- FG    1   909.43  953.43
- Age   1   910.25  954.25
- TOV.  1   911.54  955.54
- ORB   1   912.01  956.01
- PF    1   913.32  957.32
- USG.  1   915.63  959.63
- BLK   1   919.05  963.05
- BPM   1   920.01  964.01
- VORP  1   921.76  965.76
- DWS   1   924.33  968.33
- G     1   945.96  989.96
- AST   1   951.25  995.25
- WS    1   998.09 1042.09

Step:  AIC=950.58
quinteto ~ Age + G + TS. + FTr + STL. + TOV. + USG. + DWS + WS + 
    BPM + VORP + FG + FGA + X2P + FTA + PF + ORB + DRB + AST + 
    STL + BLK

       Df Deviance     AIC
- DRB   1   906.67  948.67
- FTA   1   906.86  948.86
- STL   1   906.93  948.93
- FGA   1   907.23  949.23
- TS.   1   907.54  949.54
- FTr   1   907.57  949.57
<none>      906.58  950.58
- STL.  1   908.75  950.75
- X2P   1   909.40  951.40
- FG    1   909.46  951.46
- Age   1   910.56  952.56
- TOV.  1   911.56  953.56
- ORB   1   912.04  954.04
- PF    1   913.32  955.32
- USG.  1   917.02  959.02
- BLK   1   919.10  961.10
- BPM   1   920.75  962.75
- VORP  1   922.01  964.01
- DWS   1   924.42  966.42
- AST   1   952.95  994.95
- G     1   961.91 1003.91
- WS    1   998.11 1040.11

Step:  AIC=948.67
quinteto ~ Age + G + TS. + FTr + STL. + TOV. + USG. + DWS + WS + 
    BPM + VORP + FG + FGA + X2P + FTA + PF + ORB + AST + STL + 
    BLK

       Df Deviance     AIC
- STL   1   906.96  946.96
- FTA   1   906.97  946.97
- FGA   1   907.36  947.36
- TS.   1   907.60  947.60
- FTr   1   907.64  947.64
<none>      906.67  948.67
- STL.  1   908.85  948.85
- FG    1   909.52  949.52
- X2P   1   909.54  949.54
- Age   1   910.73  950.73
- TOV.  1   911.68  951.68
- PF    1   913.32  953.32
- ORB   1   914.50  954.50
- USG.  1   917.02  957.02
- BLK   1   919.29  959.29
- BPM   1   920.76  960.76
- VORP  1   922.05  962.05
- DWS   1   929.97  969.97
- AST   1   953.15  993.15
- G     1   962.20 1002.20
- WS    1   998.11 1038.11

Step:  AIC=946.96
quinteto ~ Age + G + TS. + FTr + STL. + TOV. + USG. + DWS + WS + 
    BPM + VORP + FG + FGA + X2P + FTA + PF + ORB + AST + BLK

       Df Deviance     AIC
- FTA   1   907.29  945.29
- FGA   1   907.67  945.67
- FTr   1   907.91  945.91
- TS.   1   907.92  945.92
<none>      906.96  946.96
- STL.  1   909.15  947.15
- X2P   1   909.82  947.82
- FG    1   909.84  947.84
- Age   1   910.82  948.82
- TOV.  1   911.71  949.71
- PF    1   913.44  951.44
- ORB   1   914.76  952.76
- USG.  1   917.32  955.32
- BLK   1   919.29  957.29
- BPM   1   921.09  959.09
- VORP  1   923.15  961.15
- DWS   1   932.44  970.44
- AST   1   961.81  999.81
- G     1   962.27 1000.27
- WS    1  1000.34 1038.34

Step:  AIC=945.29
quinteto ~ Age + G + TS. + FTr + STL. + TOV. + USG. + DWS + WS + 
    BPM + VORP + FG + FGA + X2P + PF + ORB + AST + BLK

       Df Deviance     AIC
- TS.   1   908.16  944.16
- FGA   1   908.80  944.80
<none>      907.29  945.29
- X2P   1   909.83  945.83
- FG    1   909.84  945.84
- FTr   1   909.91  945.91
- STL.  1   910.05  946.05
- Age   1   910.95  946.95
- PF    1   913.47  949.47
- TOV.  1   913.55  949.55
- ORB   1   915.15  951.15
- BLK   1   919.76  955.76
- USG.  1   920.37  956.37
- BPM   1   923.21  959.21
- VORP  1   924.03  960.03
- DWS   1   933.02  969.02
- AST   1   962.37  998.37
- G     1   962.46  998.46
- WS    1  1034.66 1070.66

Step:  AIC=944.16
quinteto ~ Age + G + FTr + STL. + TOV. + USG. + DWS + WS + BPM + 
    VORP + FG + FGA + X2P + PF + ORB + AST + BLK

       Df Deviance     AIC
- FG    1   909.89  943.89
- X2P   1   910.00  944.00
- FTr   1   910.01  944.01
<none>      908.16  944.16
- STL.  1   911.01  945.01
- Age   1   911.88  945.88
- FGA   1   912.83  946.83
- PF    1   914.58  948.58
- TOV.  1   915.08  949.08
- ORB   1   915.96  949.96
- USG.  1   920.37  954.37
- BLK   1   920.76  954.76
- BPM   1   924.04  958.04
- VORP  1   924.14  958.14
- DWS   1   934.91  968.91
- AST   1   962.39  996.39
- G     1   968.88 1002.88
- WS    1  1035.67 1069.67

Step:  AIC=943.89
quinteto ~ Age + G + FTr + STL. + TOV. + USG. + DWS + WS + BPM + 
    VORP + FGA + X2P + PF + ORB + AST + BLK

       Df Deviance     AIC
- X2P   1   910.06  942.06
- FTr   1   911.09  943.09
<none>      909.89  943.89
- STL.  1   913.28  945.28
- Age   1   913.51  945.51
- PF    1   915.75  947.75
- ORB   1   916.18  948.18
- TOV.  1   918.77  950.77
- USG.  1   921.67  953.67
- BLK   1   922.58  954.58
- VORP  1   924.41  956.41
- BPM   1   926.71  958.71
- DWS   1   936.59  968.59
- FGA   1   951.18  983.18
- AST   1   965.50  997.50
- G     1   968.88 1000.88
- WS    1  1073.43 1105.43

Step:  AIC=942.06
quinteto ~ Age + G + FTr + STL. + TOV. + USG. + DWS + WS + BPM + 
    VORP + FGA + PF + ORB + AST + BLK

       Df Deviance     AIC
- FTr   1   911.26  941.26
<none>      910.06  942.06
- STL.  1   913.44  943.44
- Age   1   913.67  943.67
- PF    1   916.46  946.46
- ORB   1   916.60  946.60
- TOV.  1   918.83  948.83
- USG.  1   921.99  951.99
- BLK   1   922.67  952.67
- VORP  1   925.05  955.05
- BPM   1   926.77  956.77
- DWS   1   937.24  967.24
- AST   1   966.11  996.11
- G     1   969.76  999.76
- FGA   1   984.38 1014.38
- WS    1  1088.96 1118.96

Step:  AIC=941.26
quinteto ~ Age + G + STL. + TOV. + USG. + DWS + WS + BPM + VORP + 
    FGA + PF + ORB + AST + BLK

       Df Deviance     AIC
<none>      911.26  941.26
- Age   1   914.56  942.56
- STL.  1   915.38  943.38
- PF    1   917.01  945.01
- ORB   1   918.41  946.41
- TOV.  1   920.89  948.89
- BLK   1   924.19  952.19
- USG.  1   925.62  953.62
- VORP  1   927.48  955.48
- BPM   1   929.51  957.51
- DWS   1   937.73  965.73
- AST   1   968.08  996.08
- G     1   971.50  999.50
- FGA   1   984.52 1012.52
- WS    1  1107.48 1135.48

Call:  glm(formula = quinteto ~ Age + G + STL. + TOV. + USG. + DWS + 
    WS + BPM + VORP + FGA + PF + ORB + AST + BLK, family = "binomial", 
    data = bd)

Coefficients:
(Intercept)          Age            G         STL.         TOV.         USG.          DWS  
 -10.818575     0.045524    -0.133487    -0.220845     0.083679     0.104698     0.447596  
         WS          BPM         VORP          FGA           PF          ORB          AST  
   0.885259     0.166501    -0.498484     0.004465    -0.005797     0.004185     0.004953  
        BLK  
   0.008042  

Degrees of Freedom: 15359 Total (i.e. Null);  15345 Residual
Null Deviance:      3944 
Residual Deviance: 911.3    AIC: 941.3

1.3.3 GLM

glmnba.final <-  glm(formula = quinteto ~ Age + G + STL. + TOV. + USG. + DWS + 
    WS + BPM + VORP + FGA + PF + ORB + AST + BLK, family = "binomial", 
    data = bd)
summary(glmnba.final)

Call:
glm(formula = quinteto ~ Age + G + STL. + TOV. + USG. + DWS + 
    WS + BPM + VORP + FGA + PF + ORB + AST + BLK, family = "binomial", 
    data = bd)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.4771  -0.0263  -0.0104  -0.0038   3.1025  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.082e+01  1.289e+00  -8.390  < 2e-16 ***
Age          4.552e-02  2.504e-02   1.818 0.069083 .  
G           -1.335e-01  1.743e-02  -7.658 1.89e-14 ***
STL.        -2.208e-01  1.340e-01  -1.648 0.099253 .  
TOV.         8.368e-02  2.052e-02   4.078 4.54e-05 ***
USG.         1.047e-01  2.299e-02   4.554 5.27e-06 ***
DWS          4.476e-01  8.935e-02   5.009 5.46e-07 ***
WS           8.853e-01  7.298e-02  12.130  < 2e-16 ***
BPM          1.665e-01  4.224e-02   3.941 8.10e-05 ***
VORP        -4.985e-01  1.253e-01  -3.978 6.96e-05 ***
FGA          4.465e-03  5.135e-04   8.695  < 2e-16 ***
PF          -5.797e-03  2.426e-03  -2.390 0.016856 *  
ORB          4.185e-03  1.564e-03   2.676 0.007447 ** 
AST          4.953e-03  6.977e-04   7.099 1.25e-12 ***
BLK          8.042e-03  2.247e-03   3.578 0.000346 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 3944.24  on 15359  degrees of freedom
Residual deviance:  911.26  on 15345  degrees of freedom
AIC: 941.26

Number of Fisher Scoring iterations: 10
nba.prob <- predict(glmnba.final,bdpredict,type="response")

nba.pred <- rep("0_NotInAllNba",length(bdpredict$Player))
nba.pred[nba.prob > 0.5] <- '1'
(mean(nba.prob - bdpredict$quinteto)^2)
[1] 1.420207e-05
table(nba.pred,bdpredict$quinteto)
               
nba.pred           0    1
  0_NotInAllNba 3462   37
  1                7   53
(3462 + 53 )/3559
[1] 0.987637
library(pROC)
pred2 <- predict(glmnba.final, type="response")
curv_roc4 <- roc(bd$quinteto,pred2)
plot(curv_roc4)

curv_roc4$auc
Area under the curve: 0.9946
ci(curv_roc4)
95% CI: 0.9934-0.9958 (DeLong)

1.3.4 QDA

library(MASS)
qdanba <- qda(formula = quinteto ~ Age + G + STL. + TOV. + USG. + DWS + 
    WS + BPM + VORP + FGA + PF + ORB + AST + BLK, 
    data = bd)
qdanba
Call:
qda(quinteto ~ Age + G + STL. + TOV. + USG. + DWS + WS + BPM + 
    VORP + FGA + PF + ORB + AST + BLK, data = bd)

Prior probabilities of groups:
        0         1 
0.9718099 0.0281901 

Group means:
       Age        G     STL.     TOV.     USG.      DWS        WS       BPM      VORP       FGA
0 26.88725 49.59088 1.653038 15.28385 18.71998 1.120620  2.189643 -2.635975 0.4346754  383.9318
1 27.33487 76.13626 1.972979 12.90139 26.87529 4.101848 11.593533  4.939261 5.0092379 1300.2032
        PF       ORB      AST      BLK
0 110.3052  60.32304 106.1581 23.16293
1 205.2540 163.26097 377.9677 83.81986
qda.pred <- predict(qdanba, bdpredict)
qda.class <- qda.pred$class
table(qda.class,bdpredict$quinteto)
         
qda.class    0    1
        0 3296    4
        1  173   86
mean(qda.class == bdpredict$quinteto)
[1] 0.9502669
c.roc <- roc(bdpredict$quinteto, qda.pred$posterior[,2])
plot(c.roc)

c.roc$auc
Area under the curve: 0.9881
ci(c.roc)
95% CI: 0.9841-0.992 (DeLong)

1.3.5 LDA:

ldanba <- lda(formula = quinteto ~ Age + G + STL. + TOV. + USG. + DWS + 
    WS + BPM + VORP + FGA + PF + ORB + AST + BLK, 
    data = bd)
ldanba
Call:
lda(quinteto ~ Age + G + STL. + TOV. + USG. + DWS + WS + BPM + 
    VORP + FGA + PF + ORB + AST + BLK, data = bd)

Prior probabilities of groups:
        0         1 
0.9718099 0.0281901 

Group means:
       Age        G     STL.     TOV.     USG.      DWS        WS       BPM      VORP       FGA
0 26.88725 49.59088 1.653038 15.28385 18.71998 1.120620  2.189643 -2.635975 0.4346754  383.9318
1 27.33487 76.13626 1.972979 12.90139 26.87529 4.101848 11.593533  4.939261 5.0092379 1300.2032
        PF       ORB      AST      BLK
0 110.3052  60.32304 106.1581 23.16293
1 205.2540 163.26097 377.9677 83.81986

Coefficients of linear discriminants:
               LD1
Age  -0.0017043161
G    -0.0016823175
STL. -0.0293856482
TOV.  0.0148408460
USG.  0.0280796224
DWS  -0.0187349395
WS    0.2654048167
BPM  -0.0110578645
VORP  0.4231109491
FGA   0.0006739763
PF   -0.0086617870
ORB  -0.0007880899
AST   0.0000610087
BLK   0.0073037155
lda.pred <- predict(ldanba, bdpredict)
lda.class <- lda.pred$class
table(lda.class,bdpredict$quinteto)
         
lda.class    0    1
        0 3422   21
        1   47   69
mean(lda.class == bdpredict$quinteto)
[1] 0.9808935
c2.roc <- roc(bdpredict$quinteto, lda.pred$posterior[,2])
plot(c2.roc)

c2.roc$auc
Area under the curve: 0.9901
ci(c2.roc)
95% CI: 0.9862-0.994 (DeLong)

We choose the GLM method as it is the one that best predicts true positives and negatives. On the one hand it is the one that reduces the false positives the most (really what interests us to the mistakes that the model makes), but the false negatives are higher than the other models. We can say that all three models are good, but for the above reasons we will stick with the GLM.

bd15 <- rbind(bd,df_any12,df_any13,df_any14)

bd16 <- rbind(bd15,df_any15)
bd17 <- rbind(bd16,df_any16)
glmnba2015 <- glm(formula = quinteto ~ Age + G + STL. + TOV. + USG. + DWS + 
    WS + BPM + VORP + FGA + PF + ORB + AST + BLK, family = "binomial", 
    data = bd15)

nba.prob2015 <- predict(glmnba2015,df_any15,type="response")

nba.pred2015 <- rep("0_NotInAllNba",length(df_any15$Player))
nba.pred2015[nba.prob2015 > 0.5] <- '1'
nba2015 <- sort(nba.prob2015, decreasing=TRUE)

prueba2015 <- nba2015[1:40]

probs <- (prueba2015)*100
nuevodata2015 <- data.frame(probs,df_any15[names(prueba2015),1:5])
glmnba2016 <- glm(formula = quinteto ~ Age + G + STL. + TOV. + USG. + DWS + 
    WS + BPM + VORP + FGA + PF + ORB + AST + BLK, family = "binomial", 
    data = bd16)

nba.prob2016 <- predict(glmnba2016,df_any16,type="response")

nba.pred2016 <- rep("0_NotInAllNba",length(df_any16$Player))
nba.pred2016[nba.prob2016 > 0.5] <- '1'
nba2016 <- sort(nba.prob2016, decreasing=TRUE)

prueba2016 <- nba2016[1:40]
probs <- (prueba2016)*100
nuevodata2016 <- data.frame(probs,df_any16[names(prueba2016),1:5])
glmnba2017 <- glm(formula = quinteto ~ Age + G + STL. + TOV. + USG. + DWS + 
    WS + BPM + VORP + FGA + PF + ORB + AST + BLK, family = "binomial", 
    data = bd17)

nba.prob2017 <- predict(glmnba2017,df_any17,type="response")
nba.pred2017 <- rep("0_NotInAllNba",length(df_any17$Player))
nba.pred2017[nba.prob2017 > 0.5] <- '1'
nba2017 <- sort(nba.prob2017, decreasing=TRUE)

prueba2017 <- nba2017[1:40]
probs <- (prueba2017)*100
nuevodata2017 <- data.frame(probs,df_any17[names(prueba2017),1:5])
funcion_posicions_2015 <- function(df){

contG = 0
contF = 0
contC = 0
conttotal = 0
playerselection <- c()
for (i in (1:length(df$Player))) {
if (contG < 6 & (df$Pos[i] == 'PG' | df$Pos[i] == 'SG')){
   contG = contG + 1
   conttotal = conttotal + 1
   playerselection[conttotal] <- df$Player[i]
   print(playerselection)
  
}
else if (contF < 5 & (df$Pos[i] == 'SF' | df$Pos[i] == 'PF')){
  contF = contF +1
  conttotal = conttotal +1
  playerselection[conttotal] <- df$Player[i]
  print(playerselection)
}
else if (contC < 4 & df$Pos[i] == 'C'){
  contC = contC + 1
  conttotal = conttotal + 1
  playerselection[conttotal] <- df$Player[i]
  print(playerselection)

}}
print(playerselection)
df<- df %>% filter(df$Player %in% playerselection)

return(df)
}

We create the dataframe with the predictions of the model, and if the result is correct or not. We have to create a function that chooses (from the players with the highest probability of the model) the positions in the All NBATeam.

funcion_posicions <- function(df){

contG = 0
contF = 0
contC = 0
conttotal = 0
playerselection <- c()
for (i in (1:length(df$Player))) {
if (contG < 6 & (df$Pos[i] == 'PG' | df$Pos[i] == 'SG')){
   contG = contG + 1
   conttotal = conttotal + 1
   playerselection[conttotal] <- df$Player[i]
   print(playerselection)
  
}
else if (contF < 6 & (df$Pos[i] == 'SF' | df$Pos[i] == 'PF')){
  contF = contF +1
  conttotal = conttotal +1
  playerselection[conttotal] <- df$Player[i]
  print(playerselection)
}
else if (contC < 3 & df$Pos[i] == 'C'){
  contC = contC + 1
  conttotal = conttotal + 1
  playerselection[conttotal] <- df$Player[i]
  print(playerselection)

}}
print(playerselection)
df<- df %>% filter(df$Player %in% playerselection)

return(df)
}

1.4 Final Visualizations:

1.4.1 YEAR 2015

tablaprediccio15 <- funcion_posicions_2015(nuevodata2015)
[1] "James Harden"
[1] "James Harden"  "Stephen Curry"
[1] "James Harden"  "Stephen Curry" "Chris Paul"   
[1] "James Harden"  "Stephen Curry" "Chris Paul"    "Anthony Davis"
[1] "James Harden"      "Stephen Curry"     "Chris Paul"        "Anthony Davis"    
[5] "Russell Westbrook"
[1] "James Harden"      "Stephen Curry"     "Chris Paul"        "Anthony Davis"    
[5] "Russell Westbrook" "LeBron James"     
[1] "James Harden"      "Stephen Curry"     "Chris Paul"        "Anthony Davis"    
[5] "Russell Westbrook" "LeBron James"      "Pau Gasol"        
[1] "James Harden"      "Stephen Curry"     "Chris Paul"        "Anthony Davis"    
[5] "Russell Westbrook" "LeBron James"      "Pau Gasol"         "LaMarcus Aldridge"
[1] "James Harden"      "Stephen Curry"     "Chris Paul"        "Anthony Davis"    
[5] "Russell Westbrook" "LeBron James"      "Pau Gasol"         "LaMarcus Aldridge"
[9] "Damian Lillard"   
 [1] "James Harden"      "Stephen Curry"     "Chris Paul"        "Anthony Davis"    
 [5] "Russell Westbrook" "LeBron James"      "Pau Gasol"         "LaMarcus Aldridge"
 [9] "Damian Lillard"    "John Wall"        
 [1] "James Harden"      "Stephen Curry"     "Chris Paul"        "Anthony Davis"    
 [5] "Russell Westbrook" "LeBron James"      "Pau Gasol"         "LaMarcus Aldridge"
 [9] "Damian Lillard"    "John Wall"         "Blake Griffin"    
 [1] "James Harden"      "Stephen Curry"     "Chris Paul"        "Anthony Davis"    
 [5] "Russell Westbrook" "LeBron James"      "Pau Gasol"         "LaMarcus Aldridge"
 [9] "Damian Lillard"    "John Wall"         "Blake Griffin"     "Marc Gasol"       
 [1] "James Harden"      "Stephen Curry"     "Chris Paul"        "Anthony Davis"    
 [5] "Russell Westbrook" "LeBron James"      "Pau Gasol"         "LaMarcus Aldridge"
 [9] "Damian Lillard"    "John Wall"         "Blake Griffin"     "Marc Gasol"       
[13] "Tim Duncan"       
 [1] "James Harden"      "Stephen Curry"     "Chris Paul"        "Anthony Davis"    
 [5] "Russell Westbrook" "LeBron James"      "Pau Gasol"         "LaMarcus Aldridge"
 [9] "Damian Lillard"    "John Wall"         "Blake Griffin"     "Marc Gasol"       
[13] "Tim Duncan"        "DeMarcus Cousins" 
 [1] "James Harden"      "Stephen Curry"     "Chris Paul"        "Anthony Davis"    
 [5] "Russell Westbrook" "LeBron James"      "Pau Gasol"         "LaMarcus Aldridge"
 [9] "Damian Lillard"    "John Wall"         "Blake Griffin"     "Marc Gasol"       
[13] "Tim Duncan"        "DeMarcus Cousins"  "DeAndre Jordan"   
 [1] "James Harden"      "Stephen Curry"     "Chris Paul"        "Anthony Davis"    
 [5] "Russell Westbrook" "LeBron James"      "Pau Gasol"         "LaMarcus Aldridge"
 [9] "Damian Lillard"    "John Wall"         "Blake Griffin"     "Marc Gasol"       
[13] "Tim Duncan"        "DeMarcus Cousins"  "DeAndre Jordan"   
tablaprediccio15$quinteto <- as.logical(tablaprediccio15$quinteto)
tablaresultats15 <- tablaprediccio15 %>% select(Player, Age, Pos, probs, quinteto)
names(tablaresultats15)= c("Player","Age","Position","Probabilitat","Is in?")


tablaresultats15 %>% gt()%>% tab_header(
    title = md("Best players 2015"))
Best players 2015
Player Age Position Probabilitat Is in?
James Harden 25 SG 99.82453 TRUE
Stephen Curry 26 PG 99.46583 TRUE
Chris Paul 29 PG 98.94948 TRUE
Anthony Davis 21 PF 98.68507 TRUE
Russell Westbrook 26 PG 97.67883 TRUE
LeBron James 30 SF 91.07932 TRUE
Pau Gasol 34 PF 79.18571 TRUE
LaMarcus Aldridge 29 PF 71.48565 TRUE
Damian Lillard 24 PG 49.70695 FALSE
John Wall 24 PG 39.83640 FALSE
Blake Griffin 25 PF 36.01446 TRUE
Marc Gasol 30 C 34.85991 TRUE
Tim Duncan 38 C 25.74312 TRUE
DeMarcus Cousins 24 C 21.20210 TRUE
DeAndre Jordan 26 C 17.97916 TRUE
no_corresponen2015 <- tablaprediccio15 %>% filter(quinteto %in% 0)
no_corresponen2015
       probs quinteto Year         Player Pos Age
351 49.70695    FALSE 2015 Damian Lillard  PG  24
608 39.83640    FALSE 2015      John Wall  PG  24
nuevodata2015 %>% filter(Pos %in% c("PG","SG"))
        probs quinteto Year            Player Pos Age
250 99.824535        1 2015      James Harden  SG  25
136 99.465827        1 2015     Stephen Curry  PG  26
448 98.949478        1 2015        Chris Paul  PG  29
616 97.678831        1 2015 Russell Westbrook  PG  26
351 49.706950        0 2015    Damian Lillard  PG  24
608 39.836400        0 2015         John Wall  PG  24
88  34.920491        0 2015      Jimmy Butler  SG  25
284 32.246938        1 2015      Kyrie Irving  PG  22
582 16.396576        1 2015     Klay Thompson  SG  24
567  7.883518        0 2015       Jeff Teague  PG  26
340  3.052430        0 2015         Ty Lawson  PG  27
357  2.954811        0 2015        Kyle Lowry  PG  28
602  1.958710        0 2015       Dwyane Wade  SG  33
267  1.836709        0 2015       George Hill  PG  28
62   1.768488        0 2015      Eric Bledsoe  PG  25
122  1.671994        0 2015       Mike Conley  PG  27
330  1.539479        0 2015    Brandon Knight  PG  23
sustituts2015 <- nuevodata2015 %>% filter(Player %in% c("Kyrie Irving","Klay Thompson"))
sustituts2015
       probs quinteto Year        Player Pos Age
284 32.24694        1 2015  Kyrie Irving  PG  22
582 16.39658        1 2015 Klay Thompson  SG  24
correccio2015 <- rbind(no_corresponen2015,sustituts2015)
correccioresultats <- correccio2015 %>% select(Player, Age, Pos, probs, quinteto)
correccioresultats$quinteto <- as.logical(correccioresultats$quinteto)
names(correccioresultats)= c("Player","Age","Position","Probabilitat","Is in?")

correccioresultats %>% gt()%>% tab_header(
    title = md("substitutions"))
substitutions
Player Age Position Probabilitat Is in?
Damian Lillard 24 PG 49.70695 FALSE
John Wall 24 PG 39.83640 FALSE
Kyrie Irving 22 PG 32.24694 TRUE
Klay Thompson 24 SG 16.39658 TRUE
plotly2015 <- ggplot(data = tablaprediccio15,mapping = aes(x=reorder(Player,probs),y=probs,fill = quinteto)) + geom_bar(stat = "identity")+
  theme_bw() + theme(axis.text.x=element_text(angle=90))+labs(title = "Prediction",
     subtitle = "2015",
     x        = "Players",
     y        = "Probability") + coord_cartesian(ylim = c(10,100))

ggplotly(plotly2015)
plotly2015pos <- ggplot(data = tablaprediccio15,
                     mapping = aes( x = reorder(Player,probs),y=probs, fill = Pos)) +
 
  geom_bar(stat = "identity") +
   scale_fill_manual(values=c("chartreuse","brown1","deepskyblue", "brown3","deepskyblue3")) + 
  theme_bw() +theme(axis.text.x=element_text(angle=90))+labs(title = "Predicition",
     subtitle = "2015",
     x        = "Players",
     y        = "Probability") + coord_cartesian(ylim = c(10,100))
ggplotly(plotly2015pos)

Checking the model in 2015, we appreciate that the results obtained seem very accurate. Whereas we have a database with many players every season, in this case 650, manages to predict 13 of the 15 players at the ALL NBA TEAM.

Knowing that voting is subjective depending on the player’s game, and not on his statistics, we note that our model explains these votes with a very high probability of success. In the table of substitutions, these are the players who should be in the quintet (Kyrie Irving and Klay Thompson) replacing those who have not been able to correctly predict our model (Jhon Wall and Damian Lillard). We also show the probabilities that our model gives to these players. That way we can learn a little more about our mistakes.

To make this boxplot we do it with a database of the 30 players who are most likely to be in the quintet. On the right we can see as the median of the players who are going to be in the quintet that year we give a probability of 71.49%, while we give the other players a median of 6.59%. We note that there are 3 outliers who are the players our model predicts will be. One of them does not enter due to the restriction of positions.

In the position graph we can see the division of positions that explains a little more the errors of the model, as John Wall and Damian Lillard (both errors) enter although they have a higher probability than other players in the model, would enter position in the last two places.

1.4.2 YEAR 2016

###########2016

tablaprediccio16 <- funcion_posicions(nuevodata2016)
[1] "Stephen Curry"
[1] "Stephen Curry"     "Russell Westbrook"
[1] "Stephen Curry"     "Russell Westbrook" "Kevin Durant"     
[1] "Stephen Curry"     "Russell Westbrook" "Kevin Durant"      "LeBron James"     
[1] "Stephen Curry"     "Russell Westbrook" "Kevin Durant"      "LeBron James"     
[5] "James Harden"     
[1] "Stephen Curry"     "Russell Westbrook" "Kevin Durant"      "LeBron James"     
[5] "James Harden"      "Chris Paul"       
[1] "Stephen Curry"     "Russell Westbrook" "Kevin Durant"      "LeBron James"     
[5] "James Harden"      "Chris Paul"        "Kawhi Leonard"    
[1] "Stephen Curry"     "Russell Westbrook" "Kevin Durant"      "LeBron James"     
[5] "James Harden"      "Chris Paul"        "Kawhi Leonard"     "Kyle Lowry"       
[1] "Stephen Curry"     "Russell Westbrook" "Kevin Durant"      "LeBron James"     
[5] "James Harden"      "Chris Paul"        "Kawhi Leonard"     "Kyle Lowry"       
[9] "Damian Lillard"   
 [1] "Stephen Curry"     "Russell Westbrook" "Kevin Durant"      "LeBron James"     
 [5] "James Harden"      "Chris Paul"        "Kawhi Leonard"     "Kyle Lowry"       
 [9] "Damian Lillard"    "LaMarcus Aldridge"
 [1] "Stephen Curry"     "Russell Westbrook" "Kevin Durant"      "LeBron James"     
 [5] "James Harden"      "Chris Paul"        "Kawhi Leonard"     "Kyle Lowry"       
 [9] "Damian Lillard"    "LaMarcus Aldridge" "Hassan Whiteside" 
 [1] "Stephen Curry"     "Russell Westbrook" "Kevin Durant"      "LeBron James"     
 [5] "James Harden"      "Chris Paul"        "Kawhi Leonard"     "Kyle Lowry"       
 [9] "Damian Lillard"    "LaMarcus Aldridge" "Hassan Whiteside"  "Draymond Green"   
 [1] "Stephen Curry"     "Russell Westbrook" "Kevin Durant"      "LeBron James"     
 [5] "James Harden"      "Chris Paul"        "Kawhi Leonard"     "Kyle Lowry"       
 [9] "Damian Lillard"    "LaMarcus Aldridge" "Hassan Whiteside"  "Draymond Green"   
[13] "Paul Millsap"     
 [1] "Stephen Curry"     "Russell Westbrook" "Kevin Durant"      "LeBron James"     
 [5] "James Harden"      "Chris Paul"        "Kawhi Leonard"     "Kyle Lowry"       
 [9] "Damian Lillard"    "LaMarcus Aldridge" "Hassan Whiteside"  "Draymond Green"   
[13] "Paul Millsap"      "Anthony Davis"    
 [1] "Stephen Curry"     "Russell Westbrook" "Kevin Durant"      "LeBron James"     
 [5] "James Harden"      "Chris Paul"        "Kawhi Leonard"     "Kyle Lowry"       
 [9] "Damian Lillard"    "LaMarcus Aldridge" "Hassan Whiteside"  "Draymond Green"   
[13] "Paul Millsap"      "Anthony Davis"     "DeMarcus Cousins" 
 [1] "Stephen Curry"     "Russell Westbrook" "Kevin Durant"      "LeBron James"     
 [5] "James Harden"      "Chris Paul"        "Kawhi Leonard"     "Kyle Lowry"       
 [9] "Damian Lillard"    "LaMarcus Aldridge" "Hassan Whiteside"  "Draymond Green"   
[13] "Paul Millsap"      "Anthony Davis"     "DeMarcus Cousins" 
tablaprediccio16$quinteto <- as.logical(tablaprediccio16$quinteto)
tablaresultats16 <- tablaprediccio16 %>% select(Player, Age, Pos, probs, quinteto)
names(tablaresultats16)= c("Player","Age","Position","Probability","Is in?")


tablaresultats16 %>% gt()%>% tab_header(
    title = md("Best players 2016"))
Best players 2016
Player Age Position Probability Is in?
Stephen Curry 27 PG 99.95947 TRUE
Russell Westbrook 27 PG 99.66814 TRUE
Kevin Durant 27 SF 99.59944 TRUE
LeBron James 31 SF 99.27701 TRUE
James Harden 26 SG 97.84001 FALSE
Chris Paul 30 PG 94.96401 TRUE
Kawhi Leonard 24 SF 92.42842 TRUE
Kyle Lowry 29 PG 61.28511 TRUE
Damian Lillard 25 PG 60.52303 TRUE
LaMarcus Aldridge 30 PF 51.54793 TRUE
Hassan Whiteside 26 C 49.35669 FALSE
Draymond Green 25 PF 48.86015 TRUE
Paul Millsap 30 PF 42.52445 FALSE
Anthony Davis 22 C 22.50300 FALSE
DeMarcus Cousins 25 C 22.24952 TRUE
no_corresponen2016 <- tablaprediccio16 %>% filter(quinteto %in% 0)
nuevodata2016 %>% filter(Pos %in% c("SG","PF","SF","C"))
        probs quinteto Year             Player Pos Age
138 99.599444        1 2016       Kevin Durant  SF  27
265 99.277015        1 2016       LeBron James  SF  31
206 97.840012        0 2016       James Harden  SG  26
323 92.428424        1 2016      Kawhi Leonard  SF  24
7   51.547933        1 2016  LaMarcus Aldridge  PF  30
555 49.356690        0 2016   Hassan Whiteside   C  26
189 48.860151        1 2016     Draymond Green  PF  25
127 43.630190        0 2016      DeMar DeRozan  SG  26
376 42.524450        0 2016       Paul Millsap  PF  30
176 35.686584        1 2016        Paul George  SF  25
119 22.502996        0 2016      Anthony Davis   C  22
107 22.249519        1 2016   DeMarcus Cousins   C  25
78  20.457942        0 2016       Jimmy Butler  SG  26
173 19.153645        0 2016          Pau Gasol   C  35
293 16.846014        1 2016     DeAndre Jordan   C  27
246 11.252690        0 2016         Al Horford   C  29
165  9.584321        0 2016    Jimmer Fredette  SG  26
220  7.914784        0 2016     Gordon Hayward  SF  25
23   7.542809        0 2016    Carmelo Anthony  SF  31
133  6.070247        1 2016     Andre Drummond   C  22
517  5.361564        1 2016      Klay Thompson  SG  25
332  4.959915        0 2016         Kevin Love  PF  27
153  4.136556        0 2016     Derrick Favors  PF  24
541  4.031622        0 2016        Dwyane Wade  SG  34
62   3.951863        0 2016         Chris Bosh  PF  31
524  3.781190        0 2016 Karl-Anthony Towns   C  20
198  3.604408        0 2016      Blake Griffin  PF  26
330  2.151448        0 2016        Brook Lopez   C  27
539  1.645186        0 2016     Nikola Vucevic   C  25
sustituts2016 <- nuevodata2016 %>% filter(Player %in% c("DeAndre Jordan","Andre Drummond","Klay Thompson","Paul George"))

correccio2016 <- rbind(no_corresponen2016,sustituts2016)
correccioresultats2016 <- correccio2016 %>% select(Player, Age, Pos, probs, quinteto)
correccioresultats2016$quinteto <- as.logical(correccioresultats2016$quinteto)
names(correccioresultats2016)= c("Player","Age","Position","Probability","Is in?")

correccioresultats2016 %>% gt()%>% tab_header(
    title = md("Substitutions"))
Substitutions
Player Age Position Probability Is in?
James Harden 26 SG 97.840012 FALSE
Hassan Whiteside 26 C 49.356690 FALSE
Paul Millsap 30 PF 42.524450 FALSE
Anthony Davis 22 C 22.502996 FALSE
Paul George 25 SF 35.686584 TRUE
DeAndre Jordan 27 C 16.846014 TRUE
Andre Drummond 22 C 6.070247 TRUE
Klay Thompson 25 SG 5.361564 TRUE
############## Graphs

plotly2016 <- ggplot(data = tablaprediccio16,mapping = aes(x=reorder(Player,probs),y=probs,fill = quinteto)) +

  geom_bar(stat = "identity")+
  theme_bw() +theme(axis.text.x=element_text(angle=90))+labs(title = "Predictions",
     subtitle = "2016",
     x        = "Players",
     y        = "Probability") + coord_cartesian(ylim = c(10,100))
ggplotly(plotly2016)
plotly2016pos <- ggplot(data = tablaprediccio16,
                     mapping = aes( x = reorder(Player,probs),y=probs, fill = Pos)) +
 
  geom_bar(stat = "identity") +
   scale_fill_manual(values=c("chartreuse","brown1","deepskyblue", "brown3","deepskyblue3")) + 
  theme_bw() +theme(axis.text.x=element_text(angle=90))+labs(title = "Predictions",
     subtitle = "2016",
     x        = "Players",
     y        = "Probability") + coord_cartesian(ylim = c(10,100))
ggplotly(plotly2016pos)

This year is the year in which we find the most mistakes, especially focusing on the mistake of James Hardem who gives him a 97.8% probability of belonging to the quintets. Researching a bit about the player we realize that he belongs to the quintets from 2013 to 2019 (with the exception of this year) and since 2014 he always appears in the first quintet.

We note that this year is the year in which the player got the fewest wins (a difference of 14 compared to other years), the mistake is because our model does not consider them. We think that this lack of victories influenced the voting. Although his individual statistics were very prominent. This player would have entered the quintet according to the votes in that year in the NBA, but as we know, there is a restriction of positions, which caused him not to enter.

In the table we can find which players have to be in the quintet replacing the model errors.

This year we have the highest average of players who are not in the quintet, although we can see that the median probability of players who are in the quintet is twice that of those who are not. Therefore we can consider that we make a good prediction.

As we can see in this model, we already have 6 players in the “guard” position, this fact causes James Harden to not be able to enter this year’s ALL NBA TEAM.

1.4.3 YEAR 2017:

#############2017

tablaprediccio17 <- funcion_posicions(nuevodata2017)
[1] "James Harden"
[1] "James Harden"      "Russell Westbrook"
[1] "James Harden"      "Russell Westbrook" "LeBron James"     
[1] "James Harden"      "Russell Westbrook" "LeBron James"      "Anthony Davis"    
[1] "James Harden"      "Russell Westbrook" "LeBron James"      "Anthony Davis"    
[5] "Kawhi Leonard"    
[1] "James Harden"      "Russell Westbrook" "LeBron James"      "Anthony Davis"    
[5] "Kawhi Leonard"     "Kevin Durant"     
[1] "James Harden"      "Russell Westbrook" "LeBron James"      "Anthony Davis"    
[5] "Kawhi Leonard"     "Kevin Durant"      "Jimmy Butler"     
[1] "James Harden"      "Russell Westbrook" "LeBron James"      "Anthony Davis"    
[5] "Kawhi Leonard"     "Kevin Durant"      "Jimmy Butler"      "Stephen Curry"    
[1] "James Harden"      "Russell Westbrook" "LeBron James"      "Anthony Davis"    
[5] "Kawhi Leonard"     "Kevin Durant"      "Jimmy Butler"      "Stephen Curry"    
[9] "Isaiah Thomas"    
 [1] "James Harden"          "Russell Westbrook"     "LeBron James"          "Anthony Davis"        
 [5] "Kawhi Leonard"         "Kevin Durant"          "Jimmy Butler"          "Stephen Curry"        
 [9] "Isaiah Thomas"         "Giannis Antetokounmpo"
 [1] "James Harden"          "Russell Westbrook"     "LeBron James"          "Anthony Davis"        
 [5] "Kawhi Leonard"         "Kevin Durant"          "Jimmy Butler"          "Stephen Curry"        
 [9] "Isaiah Thomas"         "Giannis Antetokounmpo" "Karl-Anthony Towns"   
 [1] "James Harden"          "Russell Westbrook"     "LeBron James"          "Anthony Davis"        
 [5] "Kawhi Leonard"         "Kevin Durant"          "Jimmy Butler"          "Stephen Curry"        
 [9] "Isaiah Thomas"         "Giannis Antetokounmpo" "Karl-Anthony Towns"    "John Wall"            
 [1] "James Harden"          "Russell Westbrook"     "LeBron James"          "Anthony Davis"        
 [5] "Kawhi Leonard"         "Kevin Durant"          "Jimmy Butler"          "Stephen Curry"        
 [9] "Isaiah Thomas"         "Giannis Antetokounmpo" "Karl-Anthony Towns"    "John Wall"            
[13] "Rudy Gobert"          
 [1] "James Harden"          "Russell Westbrook"     "LeBron James"          "Anthony Davis"        
 [5] "Kawhi Leonard"         "Kevin Durant"          "Jimmy Butler"          "Stephen Curry"        
 [9] "Isaiah Thomas"         "Giannis Antetokounmpo" "Karl-Anthony Towns"    "John Wall"            
[13] "Rudy Gobert"           "DeMar DeRozan"        
 [1] "James Harden"          "Russell Westbrook"     "LeBron James"          "Anthony Davis"        
 [5] "Kawhi Leonard"         "Kevin Durant"          "Jimmy Butler"          "Stephen Curry"        
 [9] "Isaiah Thomas"         "Giannis Antetokounmpo" "Karl-Anthony Towns"    "John Wall"            
[13] "Rudy Gobert"           "DeMar DeRozan"         "Gordon Hayward"       
 [1] "James Harden"          "Russell Westbrook"     "LeBron James"          "Anthony Davis"        
 [5] "Kawhi Leonard"         "Kevin Durant"          "Jimmy Butler"          "Stephen Curry"        
 [9] "Isaiah Thomas"         "Giannis Antetokounmpo" "Karl-Anthony Towns"    "John Wall"            
[13] "Rudy Gobert"           "DeMar DeRozan"         "Gordon Hayward"       
tablaprediccio17$quinteto <- as.logical(tablaprediccio17$quinteto)
tablaresultats17 <- tablaprediccio17 %>% select(Player, Age, Pos, probs, quinteto)
names(tablaresultats17)= c("Player","Age","Position","Probability","Is in?")


tablaresultats17 %>% gt()%>% tab_header(
    title = md("Best layers 2017"))
Best layers 2017
Player Age Position Probability Is in?
James Harden 27 PG 99.90859 TRUE
Russell Westbrook 28 PG 99.90621 TRUE
LeBron James 32 SF 98.82146 TRUE
Anthony Davis 23 C 97.47270 TRUE
Kawhi Leonard 25 SF 97.37003 TRUE
Kevin Durant 28 SF 96.08908 TRUE
Jimmy Butler 27 SF 95.59811 TRUE
Stephen Curry 28 PG 95.06756 TRUE
Isaiah Thomas 27 PG 94.65219 TRUE
Giannis Antetokounmpo 22 SF 88.08650 TRUE
Karl-Anthony Towns 21 C 86.31123 FALSE
John Wall 26 PG 85.78017 TRUE
Rudy Gobert 24 C 75.74142 TRUE
DeMar DeRozan 27 SG 71.95088 TRUE
Gordon Hayward 26 SF 40.50799 FALSE
no_corresponen2017 <- tablaprediccio17 %>% filter(quinteto %in% 0)
no_corresponen2017
       probs quinteto Year             Player Pos Age
530 86.31123    FALSE 2017 Karl-Anthony Towns   C  21
226 40.50799    FALSE 2017     Gordon Hayward  SF  26
nuevodata2017 %>% filter(Pos %in% c("PF","SF","C"))
        probs quinteto Year                Player Pos Age
270 98.821462        1 2017          LeBron James  SF  32
123 97.472699        1 2017         Anthony Davis   C  23
319 97.370031        1 2017         Kawhi Leonard  SF  25
144 96.089081        1 2017          Kevin Durant  SF  28
80  95.598108        1 2017          Jimmy Butler  SF  27
20  88.086502        1 2017 Giannis Antetokounmpo  SF  22
530 86.311228        0 2017    Karl-Anthony Towns   C  21
190 75.741423        1 2017           Rudy Gobert   C  24
562 47.777896        0 2017      Hassan Whiteside   C  27
111 47.182194        0 2017      DeMarcus Cousins   C  26
226 40.507995        0 2017        Gordon Hayward  SF  26
207 16.290151        0 2017         Blake Griffin  PF  27
178 13.414475        0 2017            Marc Gasol   C  32
294 10.635179        1 2017        DeAndre Jordan   C  28
285  8.761690        0 2017          Nikola Jokic   C  21
9    7.964123        0 2017     LaMarcus Aldridge  PF  31
183  5.072372        0 2017           Paul George  SF  26
250  4.244377        0 2017         Dwight Howard   C  31
203  4.138469        1 2017        Draymond Green  PF  26
333  4.036093        0 2017            Kevin Love  PF  28
372  3.296743        0 2017          Paul Millsap  PF  31
149  1.640858        0 2017           Joel Embiid   C  22
sustituts2017 <- nuevodata2017 %>% filter(Player %in% c("DeAndre Jordan","Draymond Green"))

correccio2017 <- rbind(no_corresponen2017,sustituts2017)
correccioresultats2017 <- correccio2017 %>% select(Player, Age, Pos, probs, quinteto)
correccioresultats2017$quinteto <- as.logical(correccioresultats2017$quinteto)
names(correccioresultats2017)= c("Player","Age","Position","Probability","Is in?")

correccioresultats2017 %>% gt()%>% tab_header(
    title = md("Substitutions"))
Substitutions
Player Age Position Probability Is in?
Karl-Anthony Towns 21 C 86.311228 FALSE
Gordon Hayward 26 SF 40.507995 FALSE
DeAndre Jordan 28 C 10.635179 TRUE
Draymond Green 26 PF 4.138469 TRUE
############# Graphs
plotly2017 <- ggplot(data = tablaprediccio17,mapping = aes(x=reorder(Player,probs),y=probs,fill = quinteto)) +

  geom_bar(stat = "identity")+
  theme_bw() +theme(axis.text.x=element_text(angle=90))+labs(title = "Predictions",
     subtitle = "2017",
     x        = "Player",
     y        = "Probability") + coord_cartesian(ylim = c(0,100))
ggplotly(plotly2017)
plotly2017pos <- ggplot(data = tablaprediccio17,
                     mapping = aes( x = reorder(Player,probs),y=probs, fill = Pos)) +
 
  geom_bar(stat = "identity") +
   scale_fill_manual(values=c("chartreuse","brown1","deepskyblue", "brown3","deepskyblue3")) + 
  theme_bw() +theme(axis.text.x=element_text(angle=90))+labs(title = "Prediction",
     subtitle = "2017",
     x        = "Player",
     y        = "Probability") + coord_cartesian(ylim = c(0,100))
ggplotly(plotly2017pos)

This year we can consider that there are many players with a very high probability of belonging to the quintet. We note that there are very few errors. There are two mistakes, and they are not in the top 10.

Karl-Anthony Towns of the Minnesota Timberwolves team, has a total of 31 wins and 51 losses. Being these the minimum of victories of all the predicted players. One thing we can also highlight is that this player was 16th in the quintet positions, with 4 points less than Deandre Jordan who came in 15th.

Gordon Hayward that year made the year in best statistics. It was his only year with more than 20 points per game played. It was the only year he was selected for the NBA All Star.

The boxplot diagram shows a big difference between the two groups, a little even more remarkable than the other years. With averages of 8.76% compared to 95.07%.

Focusing on the outlier we find, we realize that it is Deandre Jordan with a probability of 10.64% and that he occupies the same position as Karl-Anthony Towns (the mistake of before). Deandre Jordan is a player with a very defensive facet, so he did not have very good statistics, but he has a very good reputation in the league. His team scored 20 more victories this year than the Karl-Anthony Towns team, also entering the playoffs at the top of the table.

In the position graph we find it interesting to see how there is no player in the PS position who currently has many changes in the competition. Occupying it to the extent by SF or C players.